I am a machine learning researcher working on sequential decision-making in feedback loops, i.e., the multi-armed bandit problem. With a background in online optimization and interdisciplinary collaborations, I develop algorithms with theoretical guarantees that have important real-world

Author

Kwang-Sung Jun

Abstract

Modern practitioners use machine learning to guide their decisions in applications ranging from marketing and targeting to policy making and clinical trials. To embrace automated decision-making in closed loops, it is crucial to make decisions safely and efficiently, as poor decisions waste human resources and result in social and economic costs. To this end, I propose to build a comprehensive research program on closed-loop decision problems with an emphasis on humans in the loop. Specifically, I will investigate novel ways of interacting with humans in realistic settings, develop mathematical algorithms and theory for complex models, and apply them to salient problems in biology, psychology, and economics.

Similar resources

The Blinded Bandit: Learning with Adaptive Feedback

We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action. Such a model of adaptive feedback naturally occurs in scenarios where the environment reacts to the player’s actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, wh...

Online combinatorial optimization with stochastic decision sets and adversarial losses

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable c...

A Survey of Preference-Based Online Learning with Bandit Algorithms

In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical ...

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies fo...

Efficient Online Learning under Bandit Feedback

In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate the case when the expected rewards are a Lipschitz function of the arm and extend these results to bandits with arbitrary structure that is known to the decision maker. In these settings, we derive problem-specific regret lower bounds and propose both an asymp...

Publication date: 2017
